This week we will be looking at a key component of the data science workflow: version control. More specifically, we will create your first repo on GitHub and see how to use it to keep a record of every change made to your code. Version control is essential for collaborative coding, but it can also be immensely beneficial for your own solo projects.
Today’s session will follow the exact recipe suggested in Simon’s lecture:
Before we get into the nitty-gritty of this week’s session, I would like to suggest you all download GitHub Desktop. It is a GUI that lets you interact with GitHub and might be an additional option if you do not want to rely on RStudio or the Command Line alone. Instructions to set it up and get it running can be found here.
There are a number of different GUIs that you can try out and play around with. Some are better than others. Another free GUI, similar to GitHub Desktop, that I like is GitKraken.
Most of you will already have completed this step (we hope). For future reference, however, we have included a short reminder of the necessary steps:
Register for a GitHub account.
Install Git (or update version) using the Command Line.
which git
git --version
## /usr/bin/git
## git version 2.30.1 (Apple Git-130)
brew install git
brew install gh
git config --global user.name 'its-me'
git config --global user.email 'my-email@adress.eu'
The following section highlights the recommended workflow that you should employ when you work with Git. You can see it as a sort of recipe that you should follow under most circumstances. At each stage you can find the instructions for working through both the Command Line and through RStudio. Be aware however that these are separate processes that should not be mixed. Either you use the shell for version control or you use RStudio.
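Before moving on to the workflow, it can be worth confirming that the identity configured above was actually stored. The following is a minimal sketch, not part of the original recipe: it assumes a throwaway HOME directory so the demo never touches your real ~/.gitconfig, and it reuses the placeholder name and email from the commands above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: a temporary HOME sandboxes the demo, so your real
# ~/.gitconfig is left untouched.
HOME="$(mktemp -d)"
export HOME
unset XDG_CONFIG_HOME

# Same placeholder identity as in the configuration step above.
git config --global user.name 'its-me'
git config --global user.email 'my-email@adress.eu'

# Read the values back; --get prints the stored value for a single key.
configured_name="$(git config --global --get user.name)"
configured_email="$(git config --global --get user.email)"

echo "user.name:  ${configured_name}"
echo "user.email: ${configured_email}"
```

On your own machine you would skip the sandbox lines and run only the two --get commands to inspect the settings you already have.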
Go to your GitHub page and make sure you are logged in.
Click the green “New repository” button. Or, if you are on your own profile page, click on “Repositories”, then click the green “New” button.
How to fill this in:
Great! Now that you have created a new repo on GitHub, note that you should always create the repo on GitHub before starting your work in RStudio.
Question:
There is a distinct advantage to the “GitHub first, RStudio/Shell second” approach. Do you know what it is?
Whichever way you plan on cloning this repo, you first need to copy the URL identifying it. Luckily, there is another green button, “Code”, that allows you to do just that. Copy the HTTPS link for now. It will look something like this: https://github.com/tom-arend/my_first_repo.git.
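The URL also determines the name of the folder that cloning creates: by default, Git uses the last part of the URL without the .git suffix. As a small illustration (the variable names are hypothetical, and the URL is the example from the text), the folder name can be derived like this:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Example URL from the text; replace it with your own repository's URL.
repo_url="https://github.com/tom-arend/my_first_repo.git"

# basename drops everything up to the last slash and the ".git" suffix.
repo_dir="$(basename "$repo_url" .git)"

echo "git clone will create the folder: ${repo_dir}"
```

Knowing the folder name in advance makes it easier to decide where in your directory structure the clone should live.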
Open the Terminal on your laptop.
Be sure to check what directory you’re in. $ pwd displays the working directory. $ cd is the command to change directory.
Clone a repo into your chosen directory.
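If you would like to rehearse this step without a network connection, the sketch below runs the whole clone-and-check cycle inside a temporary folder. It rests on several assumptions that are not part of the session: a local repository stands in for GitHub, and the sandbox paths, variable names, and commit message are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sandbox: everything below lives in a temporary folder.
sandbox="$(mktemp -d)"

# Stand-in for the GitHub repository: a local repo holding one commit.
git init --quiet "$sandbox/remote_repo"
echo "# my_first_repo" > "$sandbox/remote_repo/README.md"
git -C "$sandbox/remote_repo" add README.md
git -C "$sandbox/remote_repo" \
  -c user.name='its-me' -c user.email='my-email@adress.eu' \
  commit --quiet -m "Initial commit"

# Move to the directory that should contain the clone and confirm it.
mkdir -p "$sandbox/projects"
cd "$sandbox/projects"
pwd

# Clone from the local stand-in instead of an HTTPS URL.
git clone --quiet "$sandbox/remote_repo" my_first_repo

# Check whether it worked: one commit in the history, nothing pending.
cd my_first_repo
git log --oneline
git status

commit_count="$(git rev-list --count HEAD)"
status_output="$(git status --porcelain)"
echo "commits in clone: ${commit_count}"
```

One thing to keep in mind when checking a real clone: if the GitHub repo was created completely empty (without a README), git log reports that the branch has no commits yet instead of listing a history.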
cd ~/phd_hertie/teaching/IDS_fall_21
git clone https://github.com/tom-arend/my_first_repo.git
Check whether it worked:
cd ~/phd_hertie/teaching/IDS_fall_21/my_first_repo
git log
git status
In RStudio, go to:
File > New Project > Version Control > Git.
In the “repository URL”-box paste the URL of your new GitHub repository.
Do not just create some random directory for the local copy. Instead, think about how you organize your files and folders, and keep that structure coherent.
I always suggest that with any new R-project you “Open in new session”.
Finally, click “Create Project” to create a new directory. What you get are three things in one:
In the absence of other constraints, I suggest that all of your R projects have exactly this set-up.
Here is a short gif on how to clone a repo with GitHub Desktop: